CCG Parsing and Multiword Expressions

نویسنده

  • Miryam de Lhoneux
چکیده

This thesis presents a study about the integration of information about Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar (CCG). We build on previous work by Nivre and Nilsson (2004) and by Korkontzelos and Manandhar (2010) who have shown the benefit of adding information about MWEs to syntactic parsing by implementing a similar pipeline with CCG parsing. More specifically, we collapse MWEs to one token in training and test data in CCGbank, a corpus automatically created by Hockenmaier (2003) which contains sentences annotated with CCG derivations. Our collapsing algorithm however can only deal with MWEs when they form a constituent in the data which is one of the limitations of our approach. We study the effect of collapsing training and test data. A parsing effect can be obtained if collapsed data help the parser in its decisions and a training effect can be obtained if training on the collapsed data improves results. We also collapse the gold standard and show that our model significantly outperforms the baseline model on our gold standard, which indicates that there is a training effect. We show that the baseline model performs significantly better on our gold standard when the data are collapsed before parsing than when the data are collapsed after parsing which indicates that there is a parsing effect. We show that these results can lead to improved performance on the non-collapsed standard benchmark although we fail to show that it does so significantly. We conclude that despite the limited settings, there are noticeable improvements from using MWEs in parsing. We discuss ways in which the incorporation of MWEs into parsing can be improved and hypothesize that this will lead to more substantial results. We finally show that turning the MWE recognition part of the pipeline into an experimental part is a useful thing to do as we obtain different results with different recognizers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating Verb-Particle Constructions into CCG Parsing

Despite their prevalence in the English language, multiword expressions like verb-particle constructions (VPCs) are often poorly handled by NLP systems. This problem is partly due to inadequacies in existing corpora; the primary corpus for CCG-oriented work, CCGbank, does not account for VPCs at all, and is inconsistent in its handling of them. In this paper, we apply some corrective transforma...

متن کامل

Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show th...

متن کامل

Can Recognising Multiword Expressions Improve Shallow Parsing?

There is significant evidence in the literature that integrating knowledge about multiword expressions can improve shallow parsing accuracy. We present an experimental study to quantify this improvement, focusing on compound nominals, proper names and adjectivenoun constructions. The evaluation set of multiword expressions is derived from WordNet and the textual data are downloaded from the web...

متن کامل

Parsing Models for Identifying Multiword Expressions

Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is base...

متن کامل

A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper

Multiword expressions are groups of words acting as a morphologic, syntactic and semantic unit in linguistic analysis. Verbal multiword expressions represent a subgroup of multiword expressions, namely that in which a verb is the syntactic head of the group considered in its canonical (or dictionary) form. All multiword expressions are a great challenge for natural language processing, but the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1505.04420  شماره 

صفحات  -

تاریخ انتشار 2015